Presentation: Tweet"Beyond MapReduce"
Apache Hadoop is the current darling of the "Big Data" world. At its core is the MapReduce computing model for decomposing large data-analysis jobs into smaller tasks and distributing those tasks around a cluster. MapReduce itself was pioneered at Google for indexing the Web and other computations over massive data sets.
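To make the model concrete, here is a minimal sketch of the canonical word-count job, written against Hadoop's org.apache.hadoop.mapreduce Java API. The class and field names follow the standard Hadoop tutorial; the input and output paths are illustrative command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The map task: emit (word, 1) for every word in its slice of the input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The reduce task: sum the counts for each word after the shuffle.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : values) {
        sum += count.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate map output locally
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Even this trivial computation takes a page of Java, a point we return to below.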
The strengths of MapReduce are cost-effective scalability and relative maturity. Its weaknesses are its batch orientation, which makes it unsuitable for real-time event processing, and the difficulty of expressing common data-analysis idioms, such as joins and iterative algorithms, as sequences of map and reduce steps.
We can address these weaknesses in several ways. In the near term, higher-level languages and APIs, such as Hive, Pig, and Cascading, provide common query and manipulation abstractions that make MapReduce programs easier to write, as the sketch below illustrates. Longer term, we need new distributed computing models that adapt more flexibly to different kinds of problems and offer better real-time performance.
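As one illustration, here is the same word count sketched with the Cascading Java API; this assumes Cascading 2.x, and the field names and paths are illustrative. Cascading plans the dataflow into MapReduce jobs behind the scenes; Hive and Pig offer comparable economy in a SQL dialect and a dataflow language, respectively.

import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class CascadingWordCount {
  public static void main(String[] args) {
    // Source and sink "taps": where the data lives on HDFS.
    Tap docTap = new Hfs(new TextLine(new Fields("line")), args[0]);
    Tap wcTap  = new Hfs(new TextDelimited(new Fields("word", "count"), "\t"), args[1]);

    // The whole dataflow: split lines into words, group by word, count.
    Pipe wcPipe = new Pipe("wordcount");
    wcPipe = new Each(wcPipe, new Fields("line"),
                      new RegexSplitGenerator(new Fields("word"), "\\s+"));
    wcPipe = new GroupBy(wcPipe, new Fields("word"));
    wcPipe = new Every(wcPipe, Fields.ALL, new Count(new Fields("count")), Fields.ALL);

    FlowDef flowDef = FlowDef.flowDef()
        .setName("wordcount")
        .addSource(wcPipe, docTap)
        .addTailSink(wcPipe, wcTap);

    // The connector plans and runs the equivalent MapReduce job(s).
    new HadoopFlowConnector(new Properties()).connect(flowDef).complete();
  }
}

The pipe assembly names the logical steps, split, group, and count, and leaves the job plumbing to the planner, which is exactly the kind of abstraction the raw API lacks.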
We'll review these strengths and weaknesses of MapReduce and the Hadoop implementation, then discuss several emerging alternatives, such as Google's Pregel system for graph processing and Storm for event processing. We'll finish with some speculation about the longer-term future of Big Data.